438 research outputs found
Detecting word substitutions in text
Searching for words on a watchlist is one way in which large-scale surveillance of communication can be done, for example in intelligence and counterterrorism settings. One obvious defense is to replace words that might attract attention to a message with other, more innocuous, words. For example, the sentence "the attack will be tomorrow" might be altered to "the complex will be tomorrow", since 'complex' is a word whose frequency is close to that of 'attack'. Such substitutions are readily detectable by humans since they do not make sense. We address the problem of detecting such substitutions automatically, by looking for discrepancies between words and their contexts, and using only syntactic information. We define a set of measures, each of which is quite weak, but which together produce per-sentence detection rates around 90% with false positive rates around 10%. Rules for combining per-sentence detection into per-message detection can reduce the false positive and false negative rates for messages to practical levels. We test the approach using sentences from the Enron email and Brown corpora, representing informal and formal text respectively.
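As a rough illustration of how combining per-sentence results could lower message-level error rates, the sketch below assumes a hypothetical k-of-n rule (flag a message when at least k of its n sentences are flagged), independent sentences, and an altered message in which every sentence contains a substitution. None of these assumptions is taken from the paper; the 90% and 10% figures are the approximate per-sentence rates quoted in the abstract.

```python
from math import comb


def at_least_k(n: int, k: int, p: float) -> float:
    """Probability that at least k of n independent trials succeed,
    where each trial succeeds with probability p (binomial tail)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def message_rates(n: int, k: int, detect: float = 0.9, false_pos: float = 0.1):
    """Message-level (detection rate, false positive rate) under the
    hypothetical k-of-n rule.

    Assumptions (not from the paper): sentences are independent, and an
    altered message has a substitution in every one of its n sentences.
    """
    return at_least_k(n, k, detect), at_least_k(n, k, false_pos)


# Hypothetical example: 5-sentence messages, flagged when 3 or more sentences are flagged.
detection, false_positive = message_rates(n=5, k=3)
print(f"message detection rate:      {detection:.5f}")
print(f"message false positive rate: {false_positive:.5f}")
```

Under these assumptions both the message-level false negative rate and false positive rate drop below 1%, which is the kind of improvement the abstract describes; real messages with correlated or partially altered sentences would behave differently.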
Mining Large Data Sets on Grids: Issues and Prospects
When data mining and knowledge discovery techniques must be used to analyze large amounts of data, high-performance parallel and distributed computers can help to provide better computational performance and, as a consequence, deeper and more meaningful results. Recently grids, composed of large-scale, geographically distributed platforms working together, have emerged as effective architectures for high-performance decentralized computation. It is natural to consider grids as tools for distributed data-intensive applications such as data mining, but the underlying patterns of computation and data movement in such applications are different from those of more conventional high-performance computation. These differences require a different kind of grid, or at least a grid with significantly different emphases. This paper discusses the main issues, requirements, and design approaches for the implementation of grid-based knowledge discovery systems. Furthermore, some prospects and promising research directions in data-centric and knowledge-discovery-oriented grids are outlined.
Wars Without Beginning or End: Violent Political Organizations and Irregular Warfare in the Sahel-Sahara
This article examines the structure and spatial patterns of violent political
organizations in the Sahel-Sahara, a region characterized by growing political
instability over the last 20 years. Drawing on a public collection of
disaggregated data, the article uses network science to represent alliances and
conflicts of 179 organizations that were involved in violent events between
1997 and 2014. To this end, we combine two spectral embedding techniques that
have previously been considered separately: one for directed graphs
(relationships are asymmetric), and one for signed graphs (relationships are
positive or negative). Our results show that groups that are net attackers are
indistinguishable at the level of their individual behavior, but clearly
separate into pro- and anti-political violence based on the groups to which
they are close. The second part of the article maps a series of 389 events
related to nine Trans-Saharan Islamist groups between 2004 and 2014. Spatial
analysis suggests that cross-border movement has intensified following the
establishment of military bases by AQIM in Mali but reveals no evidence of a
border sanctuary. Owing to the transnational nature of conflict, the article
shows that national management strategies and foreign military interventions
have profoundly affected the movement of Islamist groups.
Novel Idea Generation, Collaborative Filtering, and Group Innovation Processes
Organizations that innovate encounter challenges due to the complexity and ambiguity of generating and making sense of novel ideas. We describe these challenges, which are exacerbated in group settings, and propose potential solutions. Specifically, we design group processes to support novel idea generation and selection, including use of a novel-information discovery (NID) tool to support creativity and brainstorming, as well as group support system and collaborative-filtering tools to support evaluation and decision making. Results indicate that the NID tool increases efficiency and effectiveness in creative tasks and that the collaborative-filtering tool can support the decision-making process by focusing the group's attention on ideas that might otherwise be neglected. Combining these two novel tools with group processes provides valuable contributions to both research and practice.
Relational Autoencoder for Feature Extraction
Feature extraction becomes increasingly important as data grows high
dimensional. Autoencoder as a neural network based feature extraction method
achieves great success in generating abstract features of high dimensional
data. However, it fails to consider the relationships of data samples which may
affect experimental results of using original and new features. In this paper,
we propose a Relation Autoencoder model considering both data features and
their relationships. We also extend it to work with other major autoencoder
models including Sparse Autoencoder, Denoising Autoencoder and Variational
Autoencoder. The proposed relational autoencoder models are evaluated on a set
of benchmark datasets and the experimental results show that considering data
relationships can generate more robust features which achieve lower
construction loss and then lower error rate in further classification compared
to the other variants of autoencoders. Comment: IJCNN-201
Inductive Discovery Of Criminal Group Structure Using Spectral Embedding
Social network analysis has often been applied to criminal groups to understand their internal structure and dynamics. While the content of communications is often restricted by constitutional and procedural constraints, data about communications is often more readily accessible. This article applies advanced network analysis techniques based on spectral embedding to such traffic data. Spectral embedding facilitates deeper analysis by embedding the graph representing a social network in a geometric space such that Euclidean distance reflects pairwise node dissimilarity. This enables visualizing a network in ways that accurately reflect the structure of the underlying group, and computing properties directly from the embedding. We illustrate spectral approaches for two 'Ndrangheta drug-smuggling networks, and extend them a) to examine triad structure (through the identification of the Simmelian backbone), which elicits key members, and b) to display temporal properties, which illustrate changing group structure. Although the two groups have the same purpose and come from the same criminal milieu, they have substantially different internal structures, which were not detectable using conventional social-network approaches. The techniques presented in this study may support law enforcement in the early stages of an investigation.
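To make the idea of spectral embedding concrete, here is a minimal sketch that places the nodes of a small undirected graph in a low-dimensional space using eigenvectors of the symmetric normalized Laplacian. This is one common variant and is not necessarily the one used in the article; the function name, the toy graph, and the use of NumPy are assumptions made for illustration.

```python
import numpy as np


def spectral_embedding(adjacency, dims=2):
    """Embed the nodes of an undirected graph in `dims` dimensions.

    Uses the symmetric normalized Laplacian L = I - D^(-1/2) A D^(-1/2).
    Assumes a connected graph with no isolated nodes (every degree > 0).
    """
    a = np.asarray(adjacency, dtype=float)
    degrees = a.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(degrees)
    laplacian = np.eye(len(a)) - d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    # eigh returns eigenvalues in ascending order for symmetric matrices.
    _, eigenvectors = np.linalg.eigh(laplacian)
    # Skip the first (trivial) eigenvector, which belongs to eigenvalue 0.
    return eigenvectors[:, 1 : dims + 1]


# Hypothetical toy network: two triangles (nodes 0-2 and 3-5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
toy = np.zeros((6, 6))
for i, j in edges:
    toy[i, j] = toy[j, i] = 1.0

coords = spectral_embedding(toy, dims=1)[:, 0]
print(np.round(coords, 3))
```

In this toy case the one-dimensional coordinates of the two triangles fall on opposite sides of zero, so nodes in the same tightly knit group end up close together and nodes in different groups end up far apart.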
Multiplicity Structure of the Hadronic Final State in Diffractive Deep-Inelastic Scattering at HERA
The multiplicity structure of the hadronic system X produced in
deep-inelastic processes at HERA of the type ep -> eXY, where Y is a hadronic
system with mass M_Y < 1.6 GeV and where the squared momentum transfer at the pY
vertex, t, is limited to |t| < 1 GeV^2, is studied as a function of the invariant
mass M_X of the system X. Results are presented on multiplicity distributions
and multiplicity moments, rapidity spectra and forward-backward correlations in
the centre-of-mass system of X. The data are compared to results in e+e-
annihilation, fixed-target lepton-nucleon collisions, hadro-produced
diffractive final states and to non-diffractive hadron-hadron collisions. The
comparison suggests a production mechanism of virtual photon dissociation which
involves a mixture of partonic states and a significant gluon content. The data
are well described by a model, based on a QCD-Regge analysis of the diffractive
structure function, which assumes a large hard gluonic component of the
colourless exchange at low Q^2. A model with soft colour interactions is also
successful. Comment: 22 pages, 4 figures, submitted to Eur. Phys. J., error in first
submission - omitted bibliograph